Action Recognition with CAM Visualization


Author: Wendell Hom
Dataset: Stanford 40 Actions
Architecture: VGG16

Input Image Size: 500x500x3
Initial Learning Rate: 0.0001, decreased by a factor of 10 if the validation loss stops improving

This notebook implements an action recognition CNN built on VGG16, with a Global Average Pooling (GAP) layer followed by a 40-way softmax trained with categorical crossentropy. The GAP layer makes it simple to visualize the class activation map (CAM) in order to understand which parts of the image were significant in leading the model to the given prediction.

A few images downloaded from the web were also used for inference in the wild at the bottom of this notebook.
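For reference, the CAM for a class is just a weighted sum of the final convolutional feature maps, using the dense-layer weights that connect the GAP output to that class. A minimal numpy sketch of the idea (names here are illustrative; the notebook's full implementation is the predict function near the end):

import numpy as np

# feature_maps: last conv layer output for one image, shape (H, W, K)
# class_weights: dense-layer weights for the chosen class, shape (K,)
def class_activation_map(feature_maps, class_weights):
    # Weighted sum over the K feature maps yields an (H, W) heat map
    return np.dot(feature_maps, class_weights)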

In [1]:
import tensorflow as tf


# Check tensorflow version
print("Using Tensorflow %s\n" % (tf.__version__))

# Check to see if graphics card is doing OK memory-wise
!nvidia-smi
Using Tensorflow 2.0.0

Sat Mar 14 00:24:50 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P0    49W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
In [2]:
%load_ext tensorboard
In [3]:
from tensorflow.keras.metrics import top_k_categorical_accuracy

def top5(y_true, y_pred):
    return top_k_categorical_accuracy(y_true, y_pred, k=5)
In [4]:
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras.callbacks import ModelCheckpoint

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import optimizers

from sklearn import metrics
import math
import numpy as np
from tensorflow.keras.models       import Model

Run this section if we are loading a previously trained model¶

from tensorflow.keras.models import load_model

# The custom top5 metric must be passed back in when deserializing the model
dependencies = { 'top5': top5 }

model = load_model('models/class_only/30_epochs.h5', custom_objects=dependencies)

model.summary()

Create Model Using ImageNet Weights¶

In [5]:
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

pixels = 500
# Input pixel dimensions.  All training and test examples will be resized to (pixels, pixels, 3)
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(pixels,pixels,3))

conv_base.trainable = False

In [6]:
conv_base.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 500, 500, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 500, 500, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 500, 500, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 250, 250, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 250, 250, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 250, 250, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 125, 125, 128)     0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 125, 125, 256)     295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 125, 125, 256)     590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 125, 125, 256)     590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 62, 62, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 62, 62, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 62, 62, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 62, 62, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 31, 31, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 31, 31, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 31, 31, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 31, 31, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 15, 15, 512)       0         
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________
In [7]:
model = models.Sequential()
model.add(conv_base)
model.add(layers.Conv2D(1024, (3, 3), padding="same", strides=(1, 1), activation="relu", name="ClassConv"))
model.add(layers.GlobalAveragePooling2D(name="GAP"))
model.add(layers.Dense(40, activation="softmax", name="class"))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Model)                (None, 15, 15, 512)       14714688  
_________________________________________________________________
ClassConv (Conv2D)           (None, 15, 15, 1024)      4719616   
_________________________________________________________________
GAP (GlobalAveragePooling2D) (None, 1024)              0         
_________________________________________________________________
class (Dense)                (None, 40)                41000     
=================================================================
Total params: 19,475,304
Trainable params: 4,760,616
Non-trainable params: 14,714,688
_________________________________________________________________
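The extra ClassConv layer keeps a 15x15 spatial map of 1024 features. GAP averages each map down to a single number, so the dense layer's weights tie each feature map to each class; that correspondence is exactly what the CAM visualization exploits.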
In [8]:
# Dense ("class") layer weights connecting the 1024 GAP features to each of the 40 classes
all_amp_layer_weights = model.layers[-1].get_weights()[0]
In [9]:
all_amp_layer_weights.shape
Out[9]:
(1024, 40)
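One column per class: column j holds the 1024 weights that connect the GAP features to class j, and these per-class weight vectors are what the CAM computation below multiplies against the feature maps.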
In [10]:
cam_shape = tuple(model.get_layer("ClassConv").output.get_shape().as_list()[1:])
cam_shape
Out[10]:
(15, 15, 1024)

Create train and test generators¶

In [11]:
train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True, rotation_range=10, zoom_range= [0.9,1.1])
test_datagen = ImageDataGenerator(rescale=1./255)
In [12]:
BATCH_SIZE = 40


train_generator = train_datagen.flow_from_directory("Stanford40/train", batch_size=BATCH_SIZE, target_size=(pixels,pixels), class_mode = 'categorical')
valid_generator = test_datagen.flow_from_directory("Stanford40/test", batch_size=BATCH_SIZE, target_size=(pixels,pixels), class_mode = 'categorical', shuffle=False)

y_true = valid_generator.classes

train_m = len(train_generator.classes)
valid_m = len(valid_generator.classes)

mapping = dict()
for activity, idx in train_generator.class_indices.items():
    mapping[idx] = activity

train_steps = math.ceil(train_m/BATCH_SIZE)
valid_steps = math.ceil(valid_m/BATCH_SIZE)
Found 6693 images belonging to 40 classes.
Found 2839 images belonging to 40 classes.
In [13]:
from tensorflow.keras.callbacks import ReduceLROnPlateau


filepath = "models/class_only/checkpoints/epoch_{epoch:02d}-{val_loss:.2f}.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=False, mode='min')

#callback = tf.keras.callbacks.EarlyStopping(monitor='classification_loss', patience=5)

logdir = "models/class_only/logs"

tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
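ReduceLROnPlateau is imported above but never attached to a fit call. The schedule described in the header (divide the learning rate by 10 when validation loss stops improving) would look roughly like the sketch below; the patience value is an assumption, and this callback was not used in the runs recorded in this notebook.

# Sketch only: matches the LR schedule described in the header (patience is assumed)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)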
In [14]:
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=1e-4), metrics = ['acc'])
In [15]:
prev_epochs = 0
epochs = 15
In [16]:
history = model.fit_generator(train_generator, steps_per_epoch=train_steps, initial_epoch=prev_epochs, epochs=epochs, validation_data=valid_generator, validation_steps=valid_steps, callbacks=[tensorboard_callback, checkpoint])
Epoch 1/15
167/168 [============================>.] - ETA: 2s - loss: 3.3403 - acc: 0.1481
Epoch 00001: val_loss improved from inf to 2.93598, saving model to models/class_only/checkpoints/epoch_01-2.94.h5
168/168 [==============================] - 528s 3s/step - loss: 3.3384 - acc: 0.1481 - val_loss: 2.9360 - val_acc: 0.2290
Epoch 2/15
167/168 [============================>.] - ETA: 2s - loss: 2.7692 - acc: 0.2803
Epoch 00002: val_loss improved from 2.93598 to 2.64807, saving model to models/class_only/checkpoints/epoch_02-2.65.h5
168/168 [==============================] - 493s 3s/step - loss: 2.7694 - acc: 0.2804 - val_loss: 2.6481 - val_acc: 0.3290
Epoch 3/15
167/168 [============================>.] - ETA: 2s - loss: 2.4928 - acc: 0.3544
Epoch 00003: val_loss improved from 2.64807 to 2.36931, saving model to models/class_only/checkpoints/epoch_03-2.37.h5
168/168 [==============================] - 495s 3s/step - loss: 2.4934 - acc: 0.3544 - val_loss: 2.3693 - val_acc: 0.3751
Epoch 4/15
167/168 [============================>.] - ETA: 2s - loss: 2.3220 - acc: 0.3890
Epoch 00004: val_loss improved from 2.36931 to 2.28809, saving model to models/class_only/checkpoints/epoch_04-2.29.h5
168/168 [==============================] - 493s 3s/step - loss: 2.3216 - acc: 0.3888 - val_loss: 2.2881 - val_acc: 0.4086
Epoch 5/15
167/168 [============================>.] - ETA: 2s - loss: 2.1892 - acc: 0.4252
Epoch 00005: val_loss improved from 2.28809 to 2.15276, saving model to models/class_only/checkpoints/epoch_05-2.15.h5
168/168 [==============================] - 495s 3s/step - loss: 2.1919 - acc: 0.4243 - val_loss: 2.1528 - val_acc: 0.4227
Epoch 6/15
167/168 [============================>.] - ETA: 2s - loss: 2.1066 - acc: 0.4356
Epoch 00006: val_loss improved from 2.15276 to 2.11389, saving model to models/class_only/checkpoints/epoch_06-2.11.h5
168/168 [==============================] - 494s 3s/step - loss: 2.1063 - acc: 0.4360 - val_loss: 2.1139 - val_acc: 0.4396
Epoch 7/15
167/168 [============================>.] - ETA: 2s - loss: 2.0090 - acc: 0.4646
Epoch 00007: val_loss improved from 2.11389 to 2.05107, saving model to models/class_only/checkpoints/epoch_07-2.05.h5
168/168 [==============================] - 493s 3s/step - loss: 2.0083 - acc: 0.4644 - val_loss: 2.0511 - val_acc: 0.4519
Epoch 8/15
167/168 [============================>.] - ETA: 2s - loss: 1.9537 - acc: 0.4784
Epoch 00008: val_loss improved from 2.05107 to 1.95444, saving model to models/class_only/checkpoints/epoch_08-1.95.h5
168/168 [==============================] - 498s 3s/step - loss: 1.9542 - acc: 0.4787 - val_loss: 1.9544 - val_acc: 0.4748
Epoch 9/15
167/168 [============================>.] - ETA: 2s - loss: 1.8916 - acc: 0.4902
Epoch 00009: val_loss improved from 1.95444 to 1.92468, saving model to models/class_only/checkpoints/epoch_09-1.92.h5
168/168 [==============================] - 500s 3s/step - loss: 1.8923 - acc: 0.4902 - val_loss: 1.9247 - val_acc: 0.4745
Epoch 10/15
167/168 [============================>.] - ETA: 2s - loss: 1.8360 - acc: 0.5050
Epoch 00010: val_loss improved from 1.92468 to 1.88265, saving model to models/class_only/checkpoints/epoch_10-1.88.h5
168/168 [==============================] - 495s 3s/step - loss: 1.8345 - acc: 0.5058 - val_loss: 1.8826 - val_acc: 0.4882
Epoch 11/15
167/168 [============================>.] - ETA: 2s - loss: 1.7959 - acc: 0.5184
Epoch 00011: val_loss improved from 1.88265 to 1.85657, saving model to models/class_only/checkpoints/epoch_11-1.86.h5
168/168 [==============================] - 494s 3s/step - loss: 1.7961 - acc: 0.5179 - val_loss: 1.8566 - val_acc: 0.4974
Epoch 12/15
167/168 [============================>.] - ETA: 2s - loss: 1.7529 - acc: 0.5342
Epoch 00012: val_loss improved from 1.85657 to 1.83488, saving model to models/class_only/checkpoints/epoch_12-1.83.h5
168/168 [==============================] - 495s 3s/step - loss: 1.7531 - acc: 0.5341 - val_loss: 1.8349 - val_acc: 0.5005
Epoch 13/15
167/168 [============================>.] - ETA: 2s - loss: 1.7182 - acc: 0.5306
Epoch 00013: val_loss improved from 1.83488 to 1.81823, saving model to models/class_only/checkpoints/epoch_13-1.82.h5
168/168 [==============================] - 437s 3s/step - loss: 1.7175 - acc: 0.5312 - val_loss: 1.8182 - val_acc: 0.5086
Epoch 14/15
167/168 [============================>.] - ETA: 2s - loss: 1.6835 - acc: 0.5396
Epoch 00014: val_loss improved from 1.81823 to 1.77688, saving model to models/class_only/checkpoints/epoch_14-1.78.h5
168/168 [==============================] - 437s 3s/step - loss: 1.6825 - acc: 0.5400 - val_loss: 1.7769 - val_acc: 0.5125
Epoch 15/15
167/168 [============================>.] - ETA: 2s - loss: 1.6556 - acc: 0.5465
Epoch 00015: val_loss improved from 1.77688 to 1.76848, saving model to models/class_only/checkpoints/epoch_15-1.77.h5
168/168 [==============================] - 436s 3s/step - loss: 1.6543 - acc: 0.5471 - val_loss: 1.7685 - val_acc: 0.5178
In [20]:
prev_epochs = 15
epochs = 40
In [21]:
history = model.fit_generator(train_generator, steps_per_epoch=train_steps, initial_epoch=prev_epochs, epochs=epochs, validation_data=valid_generator, validation_steps=valid_steps)
Epoch 16/40
168/168 [==============================] - 459s 3s/step - loss: 1.6283 - acc: 0.5546 - val_loss: 1.7455 - val_acc: 0.5273
Epoch 17/40
168/168 [==============================] - 436s 3s/step - loss: 1.6004 - acc: 0.5642 - val_loss: 1.7281 - val_acc: 0.5203
Epoch 18/40
168/168 [==============================] - 435s 3s/step - loss: 1.5683 - acc: 0.5706 - val_loss: 1.7061 - val_acc: 0.5259
Epoch 19/40
168/168 [==============================] - 434s 3s/step - loss: 1.5485 - acc: 0.5775 - val_loss: 1.6879 - val_acc: 0.5273
Epoch 20/40
168/168 [==============================] - 436s 3s/step - loss: 1.5424 - acc: 0.5718 - val_loss: 1.6554 - val_acc: 0.5537
Epoch 21/40
168/168 [==============================] - 436s 3s/step - loss: 1.4966 - acc: 0.5921 - val_loss: 1.6877 - val_acc: 0.5259
Epoch 22/40
168/168 [==============================] - 435s 3s/step - loss: 1.4698 - acc: 0.5954 - val_loss: 1.6821 - val_acc: 0.5421
Epoch 23/40
168/168 [==============================] - 435s 3s/step - loss: 1.4500 - acc: 0.6035 - val_loss: 1.6367 - val_acc: 0.5488
Epoch 24/40
168/168 [==============================] - 434s 3s/step - loss: 1.4420 - acc: 0.6081 - val_loss: 1.6218 - val_acc: 0.5520
Epoch 25/40
168/168 [==============================] - 435s 3s/step - loss: 1.4199 - acc: 0.6115 - val_loss: 1.6290 - val_acc: 0.5530
Epoch 26/40
168/168 [==============================] - 433s 3s/step - loss: 1.4010 - acc: 0.6118 - val_loss: 1.6136 - val_acc: 0.5488
Epoch 27/40
168/168 [==============================] - 434s 3s/step - loss: 1.3952 - acc: 0.6163 - val_loss: 1.6750 - val_acc: 0.5407
Epoch 28/40
168/168 [==============================] - 434s 3s/step - loss: 1.3704 - acc: 0.6205 - val_loss: 1.5823 - val_acc: 0.5664
Epoch 29/40
168/168 [==============================] - 433s 3s/step - loss: 1.3576 - acc: 0.6233 - val_loss: 1.5964 - val_acc: 0.5460
Epoch 30/40
168/168 [==============================] - 432s 3s/step - loss: 1.3444 - acc: 0.6323 - val_loss: 1.5852 - val_acc: 0.5650
Epoch 31/40
168/168 [==============================] - 436s 3s/step - loss: 1.3262 - acc: 0.6366 - val_loss: 1.5891 - val_acc: 0.5523
Epoch 32/40
168/168 [==============================] - 434s 3s/step - loss: 1.3162 - acc: 0.6305 - val_loss: 1.5541 - val_acc: 0.5671
Epoch 33/40
168/168 [==============================] - 433s 3s/step - loss: 1.2975 - acc: 0.6410 - val_loss: 1.5499 - val_acc: 0.5569
Epoch 34/40
168/168 [==============================] - 433s 3s/step - loss: 1.2820 - acc: 0.6483 - val_loss: 1.5934 - val_acc: 0.5565
Epoch 35/40
168/168 [==============================] - 432s 3s/step - loss: 1.2702 - acc: 0.6490 - val_loss: 1.5537 - val_acc: 0.5724
Epoch 36/40
168/168 [==============================] - 434s 3s/step - loss: 1.2588 - acc: 0.6561 - val_loss: 1.5774 - val_acc: 0.5629
Epoch 37/40
168/168 [==============================] - 435s 3s/step - loss: 1.2459 - acc: 0.6559 - val_loss: 1.5138 - val_acc: 0.5692
Epoch 38/40
168/168 [==============================] - 434s 3s/step - loss: 1.2220 - acc: 0.6582 - val_loss: 1.5201 - val_acc: 0.5675
Epoch 39/40
168/168 [==============================] - 435s 3s/step - loss: 1.2200 - acc: 0.6593 - val_loss: 1.5343 - val_acc: 0.5696
Epoch 40/40
168/168 [==============================] - 435s 3s/step - loss: 1.2132 - acc: 0.6661 - val_loss: 1.5239 - val_acc: 0.5734
In [27]:
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=1e-5), metrics = ['acc'])
In [28]:
prev_epochs = 45
epochs = 60
In [29]:
history = model.fit_generator(train_generator, steps_per_epoch=train_steps, initial_epoch=prev_epochs, epochs=epochs, validation_data=valid_generator, validation_steps=valid_steps, callbacks=[tensorboard_callback, checkpoint])
Epoch 46/60
167/168 [============================>.] - ETA: 2s - loss: 1.1229 - acc: 0.6952
Epoch 00046: val_loss improved from 1.76848 to 1.46793, saving model to models/class_only/checkpoints/epoch_46-1.47.h5
168/168 [==============================] - 451s 3s/step - loss: 1.1229 - acc: 0.6952 - val_loss: 1.4679 - val_acc: 0.5900
Epoch 47/60
167/168 [============================>.] - ETA: 2s - loss: 1.1132 - acc: 0.7055
Epoch 00047: val_loss improved from 1.46793 to 1.46480, saving model to models/class_only/checkpoints/epoch_47-1.46.h5
168/168 [==============================] - 436s 3s/step - loss: 1.1128 - acc: 0.7058 - val_loss: 1.4648 - val_acc: 0.5956
Epoch 48/60
167/168 [============================>.] - ETA: 2s - loss: 1.1135 - acc: 0.7058
Epoch 00048: val_loss did not improve from 1.46480
168/168 [==============================] - 436s 3s/step - loss: 1.1122 - acc: 0.7061 - val_loss: 1.4720 - val_acc: 0.5879
Epoch 49/60
167/168 [============================>.] - ETA: 2s - loss: 1.1152 - acc: 0.7049
Epoch 00049: val_loss did not improve from 1.46480
168/168 [==============================] - 436s 3s/step - loss: 1.1151 - acc: 0.7048 - val_loss: 1.4711 - val_acc: 0.5911
Epoch 50/60
167/168 [============================>.] - ETA: 2s - loss: 1.1149 - acc: 0.7036
Epoch 00050: val_loss did not improve from 1.46480
168/168 [==============================] - 434s 3s/step - loss: 1.1152 - acc: 0.7034 - val_loss: 1.4667 - val_acc: 0.5939
Epoch 51/60
167/168 [============================>.] - ETA: 2s - loss: 1.1078 - acc: 0.7105
Epoch 00051: val_loss did not improve from 1.46480
168/168 [==============================] - 435s 3s/step - loss: 1.1074 - acc: 0.7104 - val_loss: 1.4677 - val_acc: 0.5879
Epoch 52/60
167/168 [============================>.] - ETA: 2s - loss: 1.1091 - acc: 0.7048
Epoch 00052: val_loss improved from 1.46480 to 1.46374, saving model to models/class_only/checkpoints/epoch_52-1.46.h5
168/168 [==============================] - 438s 3s/step - loss: 1.1098 - acc: 0.7045 - val_loss: 1.4637 - val_acc: 0.5903
Epoch 53/60
167/168 [============================>.] - ETA: 2s - loss: 1.1055 - acc: 0.7080
Epoch 00053: val_loss did not improve from 1.46374
168/168 [==============================] - 437s 3s/step - loss: 1.1073 - acc: 0.7072 - val_loss: 1.4669 - val_acc: 0.5935
Epoch 54/60
167/168 [============================>.] - ETA: 2s - loss: 1.1121 - acc: 0.7052
Epoch 00054: val_loss improved from 1.46374 to 1.46194, saving model to models/class_only/checkpoints/epoch_54-1.46.h5
168/168 [==============================] - 438s 3s/step - loss: 1.1125 - acc: 0.7051 - val_loss: 1.4619 - val_acc: 0.5900
Epoch 55/60
167/168 [============================>.] - ETA: 2s - loss: 1.1059 - acc: 0.7036
Epoch 00055: val_loss did not improve from 1.46194
168/168 [==============================] - 435s 3s/step - loss: 1.1058 - acc: 0.7034 - val_loss: 1.4679 - val_acc: 0.5956
Epoch 56/60
167/168 [============================>.] - ETA: 2s - loss: 1.1014 - acc: 0.7052
Epoch 00056: val_loss did not improve from 1.46194
168/168 [==============================] - 433s 3s/step - loss: 1.1021 - acc: 0.7052 - val_loss: 1.4623 - val_acc: 0.5960
Epoch 57/60
167/168 [============================>.] - ETA: 2s - loss: 1.0979 - acc: 0.7110
Epoch 00057: val_loss did not improve from 1.46194
168/168 [==============================] - 436s 3s/step - loss: 1.0987 - acc: 0.7110 - val_loss: 1.4645 - val_acc: 0.5935
Epoch 58/60
167/168 [============================>.] - ETA: 2s - loss: 1.1016 - acc: 0.7063
Epoch 00058: val_loss did not improve from 1.46194
168/168 [==============================] - 435s 3s/step - loss: 1.1016 - acc: 0.7063 - val_loss: 1.4633 - val_acc: 0.5946
Epoch 59/60
167/168 [============================>.] - ETA: 2s - loss: 1.1047 - acc: 0.7054
Epoch 00059: val_loss improved from 1.46194 to 1.45940, saving model to models/class_only/checkpoints/epoch_59-1.46.h5
168/168 [==============================] - 434s 3s/step - loss: 1.1045 - acc: 0.7054 - val_loss: 1.4594 - val_acc: 0.5946
Epoch 60/60
167/168 [============================>.] - ETA: 2s - loss: 1.0962 - acc: 0.7104
Epoch 00060: val_loss did not improve from 1.45940
168/168 [==============================] - 435s 3s/step - loss: 1.0964 - acc: 0.7097 - val_loss: 1.4628 - val_acc: 0.5886
In [ ]:
score = model.evaluate_generator(valid_generator, valid_steps)
In [ ]:
score

Saving the Model¶

In [51]:
model.save("models/class_only/VGG16-60_epochs.h5")
In [18]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Model)                (None, 15, 15, 512)       14714688  
_________________________________________________________________
ClassConv (Conv2D)           (None, 15, 15, 1024)      4719616   
_________________________________________________________________
GAP (GlobalAveragePooling2D) (None, 1024)              0         
_________________________________________________________________
class (Dense)                (None, 40)                41000     
=================================================================
Total params: 19,475,304
Trainable params: 4,760,616
Non-trainable params: 14,714,688
_________________________________________________________________

Plotting Loss and Acc¶

In [32]:
import matplotlib.pyplot as plt

plt.plot(hist['acc'])
plt.plot(hist['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'dev'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(hist['loss'])
plt.plot(hist['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'dev'], loc='upper left')
plt.show()
In [19]:
# History from the first training run (epochs 1-15)
h = dict()
h['acc'] = history.history['acc']
h['val_acc'] = history.history['val_acc']
h['loss'] = history.history['loss']
h['val_loss'] = history.history['val_loss']
In [23]:
# History from the second training run (epochs 16-40)
g = dict()
g['acc'] = history.history['acc']
g['val_acc'] = history.history['val_acc']
g['loss'] = history.history['loss']
g['val_loss'] = history.history['val_loss']
In [30]:
# History from the third training run (epochs 46-60)
i = dict()
i['acc'] = history.history['acc']
i['val_acc'] = history.history['val_acc']
i['loss'] = history.history['loss']
i['val_loss'] = history.history['val_loss']
In [31]:
# Concatenate the three runs into a single history for plotting
hist = dict()

for key in ['acc', 'val_acc', 'loss', 'val_loss']:
    hist[key] = h[key] + g[key] + i[key]

hist
Out[31]:
{'acc': [0.14806515,
  0.28044227,
  0.35440013,
  0.38876438,
  0.42432392,
  0.43597788,
  0.46436575,
  0.4787091,
  0.49021366,
  0.50575227,
  0.51785445,
  0.53414017,
  0.53115195,
  0.5399671,
  0.5471388,
  0.5546093,
  0.5641715,
  0.57059616,
  0.577469,
  0.5717914,
  0.5921112,
  0.5953982,
  0.60346633,
  0.60809803,
  0.6115344,
  0.6118333,
  0.61631554,
  0.620499,
  0.6233378,
  0.6323024,
  0.6366353,
  0.6305095,
  0.6409682,
  0.64828926,
  0.6490363,
  0.65605855,
  0.6559092,
  0.6581503,
  0.65934557,
  0.66606903,
  0.69520396,
  0.70581204,
  0.70611084,
  0.70476615,
  0.7034215,
  0.71044374,
  0.70446736,
  0.7071567,
  0.705065,
  0.7034215,
  0.7052144,
  0.7110414,
  0.70626026,
  0.7053638,
  0.7096967],
 'val_acc': [0.22895385,
  0.3289891,
  0.37513208,
  0.40859458,
  0.42268404,
  0.4395914,
  0.4519197,
  0.47481507,
  0.47446284,
  0.48820007,
  0.49735823,
  0.50052834,
  0.5086298,
  0.5125044,
  0.51778793,
  0.52729833,
  0.5202536,
  0.5258894,
  0.52729833,
  0.5537161,
  0.5258894,
  0.54209226,
  0.5487848,
  0.5519549,
  0.5530116,
  0.5487848,
  0.5406833,
  0.5663966,
  0.54596686,
  0.56498766,
  0.5523071,
  0.5671011,
  0.55688626,
  0.556534,
  0.57238466,
  0.56287426,
  0.5692145,
  0.5674533,
  0.5695667,
  0.5734413,
  0.58999646,
  0.59563226,
  0.58788306,
  0.5910532,
  0.59387106,
  0.58788306,
  0.5903487,
  0.59351885,
  0.58999646,
  0.59563226,
  0.5959845,
  0.59351885,
  0.5945756,
  0.5945756,
  0.5885875],
 'loss': [3.337895355790288,
  2.769995068178643,
  2.495402000204394,
  2.321324236596377,
  2.193093383702287,
  2.1061491715111895,
  2.0070674762799925,
  1.9541751817080288,
  1.8943065739917626,
  1.8329069024864981,
  1.795036632633879,
  1.7526963047964876,
  1.7180848388096222,
  1.681560832164969,
  1.6542462812630598,
  1.6281703483821492,
  1.6017447886900857,
  1.569664925180991,
  1.5506033639317935,
  1.5415068457483907,
  1.496098818077949,
  1.4684927267075725,
  1.4464351359315648,
  1.4410272471558865,
  1.4193528122241483,
  1.4007338715226558,
  1.393974223337098,
  1.372337036793246,
  1.3570948459456253,
  1.340602995336599,
  1.3262606140531696,
  1.316246300770993,
  1.2985011409363119,
  1.281522738707363,
  1.2700719964997609,
  1.2594986711936949,
  1.2464989693589084,
  1.2242580648051344,
  1.218915826854338,
  1.2133605781556887,
  1.1217790488685244,
  1.1101102652650199,
  1.1105868267439485,
  1.116327591181407,
  1.1156862300197123,
  1.1085332803496077,
  1.111082499949508,
  1.1069277351824813,
  1.1107803247228931,
  1.1067194859845817,
  1.1022634466328858,
  1.097851904308103,
  1.1026017454802215,
  1.1014816264266944,
  1.0942999145365895],
 'val_loss': [2.935978907934377,
  2.648071755825634,
  2.3693063720850875,
  2.2880870406056792,
  2.1527563400671514,
  2.113886870129008,
  2.051070167984761,
  1.954443741012627,
  1.9246799257439626,
  1.8826474480226005,
  1.8565712613119205,
  1.8348758178697506,
  1.8182275446367935,
  1.776876791262291,
  1.7684768849695232,
  1.7455237600165354,
  1.7281287723863628,
  1.7061195495262953,
  1.687864558797487,
  1.6554013054135819,
  1.6876549813109385,
  1.6820635497570038,
  1.6366636753082275,
  1.6218197253388418,
  1.6290285923111607,
  1.61364672805222,
  1.6750089165190576,
  1.5822633194251798,
  1.5964415728206365,
  1.585228788600841,
  1.5891252516860692,
  1.5540822668814323,
  1.5499430382755441,
  1.5933715549992844,
  1.553711065943812,
  1.5773657426028185,
  1.513758894423364,
  1.5200749711251595,
  1.5342660265069612,
  1.5238779795841433,
  1.4679263032657999,
  1.4647955726569808,
  1.4719903217235082,
  1.4710897450715723,
  1.4667060845334765,
  1.4677194096672703,
  1.463736674315493,
  1.4669159079941225,
  1.4619411515517973,
  1.4678915713874388,
  1.4623342609741319,
  1.464515917737719,
  1.4632994482215023,
  1.4593977281745052,
  1.4628121433123735]}

Confusion Matrix¶

In [33]:
cam_shape = tuple(model.get_layer("ClassConv").output.get_shape().as_list()[1:])


# Custom generator that yields each image batch with two targets
# (a dummy CAM-shaped target plus the class labels), for use with a
# two-output (ClassConv activations, class prediction) model
def multiple_outputs(generator, image_dir, batch_size, image_size):
    gen = generator.flow_from_directory(
        image_dir,
        target_size=(image_size, image_size),
        batch_size=batch_size,
        class_mode='categorical', shuffle=False)
    
    while True:
        gnext = gen.next()
        # Return the image batch and two sets of labels
        yield gnext[0], [np.zeros((gnext[1].shape[0], *cam_shape)), gnext[1]]
In [34]:
BATCH_SIZE = 12

pixels = 500
train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True, rotation_range=10, zoom_range= [0.9,1.1])
test_datagen = ImageDataGenerator(rescale=1./255)

valid_generator = multiple_outputs(test_datagen, 
                                   image_dir="Stanford40/test", 
                                   batch_size=BATCH_SIZE, 
                                   image_size=pixels)

valid_temp = test_datagen.flow_from_directory("Stanford40/test", batch_size=BATCH_SIZE, target_size=(pixels,pixels), class_mode = 'categorical', shuffle=False)
y_true = valid_temp.classes

valid_m = len(valid_temp.classes)

mapping = dict()
for activity, idx in valid_temp.class_indices.items():
    mapping[idx] = activity


valid_steps = math.ceil(valid_m/BATCH_SIZE)
Found 2839 images belonging to 40 classes.
In [35]:
BATCH_SIZE = 12

valid_generator = test_datagen.flow_from_directory("Stanford40/test", batch_size=BATCH_SIZE, target_size=(pixels,pixels), class_mode = 'categorical', shuffle=False)

y_true = valid_generator.classes

valid_m = len(valid_generator.classes)

mapping = dict()
for activity, idx in valid_generator.class_indices.items():
    mapping[idx] = activity

valid_steps = math.ceil(valid_m/BATCH_SIZE)
Found 2839 images belonging to 40 classes.
In [36]:
predictions = model.predict_generator(valid_generator, valid_steps)
In [37]:
predictions = predictions.argmax(axis=1)
In [38]:
len(predictions)
Out[38]:
2839
In [39]:
(y_true == predictions).mean()
Out[39]:
0.5885875308207115
In [40]:
mapping[5]
Out[40]:
'cooking'
In [41]:
matrix = metrics.confusion_matrix(y_true, predictions)
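Before plotting, it can be useful to list the most-confused class pairs directly. A small sketch using only the matrix and mapping already defined above:

# Sketch: print the top off-diagonal (true -> predicted) entries
conf = matrix.copy()
np.fill_diagonal(conf, 0)                  # ignore correct predictions
order = np.argsort(conf, axis=None)[::-1]  # flattened indices, descending
for flat_idx in order[:5]:
    t, p = np.unravel_index(flat_idx, conf.shape)
    print("%s -> %s: %d" % (mapping[t], mapping[p], conf[t, p]))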

Plot confusion matrix¶

In [42]:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

df_cm = pd.DataFrame(matrix, index = [mapping[i] for i in range(40)], columns = [mapping[i] for i in range(40)])
plt.figure(figsize = (40, 40))
sn.heatmap(df_cm, annot=True)
Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5fe221cc0>
In [43]:
import pickle
with open('VGG16-16_epochs-history.txt', 'wb') as file_pi:
    pickle.dump(hist, file_pi)
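To reload the pickled history in a later session (assuming the same file path as above):

# Reload the saved training history
with open('VGG16-16_epochs-history.txt', 'rb') as file_pi:
    hist = pickle.load(file_pi)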

Predicting from own image¶

In [44]:
def predict_image(file, model, mapping):
    
    # Load an image from file, resized to the model's input size
    image = load_img(file, target_size=(pixels, pixels))
    image = img_to_array(image)
    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
    # Match the training preprocessing (ImageDataGenerator rescale=1./255)
    image = image / 255.0
    
    # Expects the two-output (CAM, class) model built below
    cam, yhat = model.predict(image)
    print(cam.shape)
    print(yhat.shape)
    category = np.argmax(yhat)
    return mapping[category]
In [45]:
all_amp_layer_weights = model.layers[-1].get_weights()[0]
In [46]:
all_amp_layer_weights.shape
Out[46]:
(1024, 40)
In [47]:
# Two-output model: ClassConv activations (for the CAM) plus the class prediction
final_model = Model(inputs = model.input, 
                  outputs = (model.get_layer("ClassConv").output, model.get_layer("class").output))
            
final_model.summary()
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16_input (InputLayer)     [(None, 500, 500, 3)]     0         
_________________________________________________________________
vgg16 (Model)                (None, 15, 15, 512)       14714688  
_________________________________________________________________
ClassConv (Conv2D)           (None, 15, 15, 1024)      4719616   
_________________________________________________________________
GAP (GlobalAveragePooling2D) (None, 1024)              0         
_________________________________________________________________
class (Dense)                (None, 40)                41000     
=================================================================
Total params: 19,475,304
Trainable params: 4,760,616
Non-trainable params: 14,714,688
_________________________________________________________________
In [48]:
pixels = final_model.input.get_shape().as_list()[1]
pixels
Out[48]:
500
In [49]:
def predict(img_path, cam_model, all_amp_layer_weights):
    
    # Load and preprocess the image the same way as training (rescale only)
    img = image.load_img(img_path, target_size=(pixels, pixels))
    
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    
    processed_input = x / 255.0
    
    # Run the two-output model to get the last Conv layer's output + category prediction
    last_conv_output, pred_vec = cam_model.predict(processed_input)
    # Only 1 example, so remove the batch dimension
    last_conv_output = np.squeeze(last_conv_output)

    # Get category with highest probability
    pred = np.argmax(pred_vec)
    
    scale = pixels / last_conv_output.shape[0]
    filters = last_conv_output.shape[2]
    
    # Rescale the activation maps to the input image size
    mat_for_mult = scipy.ndimage.zoom(last_conv_output, (scale, scale, 1), order=1)
    
    # Get the weights associated with the predicted class
    amp_layer_weights = all_amp_layer_weights[:, pred]
    
    # Weighted sum of the activation maps for the predicted class,
    # reshaped back to the input image size
    final_output = np.dot(mat_for_mult.reshape((pixels*pixels, filters)), amp_layer_weights).reshape(pixels, pixels)
    
    return final_output, mapping[pred]
    
In [50]:
import matplotlib.pyplot as plt
import scipy
import glob

from tensorflow.keras.preprocessing import image

# Images downloaded from an internet search engine to try out activity classification
# Images are under the images/ directory
image_dir = "images/"
pattern  = image_dir + "*"
test_images = sorted(glob.glob(pattern))

# Details for the grid size
columns = 4
rows = math.ceil(len(test_images) / columns)
fig = plt.figure(figsize=(80, 20 * rows))

for i, image_name in enumerate(test_images):

    # Progress report
    print(".", end = '')
    
    ax = fig.add_subplot(rows, columns, i+1)

    # Load and display original image
    img = tf.keras.preprocessing.image.load_img(image_name, target_size=(pixels,pixels))
    plt.imshow(img)
    
    # Run forward pass to get the Class Activity Map and the predicted class
    cam, pred = predict(image_name, final_model, all_amp_layer_weights)
    
    # Display class activation map
    plt.imshow(cam, cmap='jet', alpha=0.5)
    # Set title to that of predicted class
    ax.set_title(pred, fontsize =60)
    
    
plt.show()
.........................
In [55]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:75% !important; }</style>"))
In [ ]: